Mon. Not. R. Astron. Soc. 000, 000-000 (0000) 



Printed 6 March 2013 



(MN MftX style file v2.2) 



Impact of systematic uncertainties in N-body simulations 
on the precision cosmology from galaxy clustering 



Hao-Yi Wu* and Dragan Hutererf 

Department of Physics, University of Michigan, 450 Church St, Ann Arbor, MI 48109-1040 



o 

(N 

s 

o 

u 

Oh 

O 
c3 



> 

in 
m 
oo 
q 

o 



6 March 2013 



X 

S-H 



ABSTRACT 

Dark matter N-body simulations provide a powerful tool to model the clustering of 
galaxies and help interpret the results of galaxy redshift surveys. However, the galaxy 
properties predicted from N-body simulations are not necessarily representative of 
the observed galaxy populations; for example, theoretical uncertainties arise from the 
absence of baryons in N-body simulations. In this work, we assess how the uncertain- 
ties in N-body simulations impact the cosmological parameters inferred from galaxy 
redshift surveys. Applying the halo model framework, we find that the velocity bias 
of galaxies in modeling the redshift-space distortions is likely to be the predominant 
source of systematic bias. For a deep, wide survey like BigBOSS, current 10 per cent 
uncertainties in the velocity bias limit fc max to 0.14 /iMpc -1 . In contrast, we find that 
the uncertainties related to the density profiles and the galaxy occupation statistics 
lead to relatively insignificant systematic biases. Therefore, the ability to calibrate the 
velocity bias accurately - from observations as well as simulations - will likely set 
the ultimate limit on the smallest length scale that can be used to infer cosmological 
information from galaxy clustering. 

Key words: cosmology: large-scale structure of Universe - cosmological parameters 
- dark matter - dark energy 



1 INTRODUCTION 

The large-scale distribution of galaxies has been probing the 
structure and composition of the universe for over three 
decades. From the pioneering analyses of the Lick cata- 



> log (Groth & Peebles 19771 and the CfA Redshift Survey 



(Huchra et al. 1983 Geller & Huchra 19891 revealing the 



cosmic web, the APM Galaxy Survey hinting the depar- 



et al 



ture from the standard cold dark matter model (Maddox 
et al.|1990 l, to the subsequent 2dF Galaxy Redshift Survey 
IColless et al.||2001h, the Sloan Digital Sky Survey (lYork 



tering is the baryon acoustic oscillations (BAO), originating 
from the waves in the primordial electron-photon plasma 
before the recombination. The sound horizon at the end of 
recombination is manifested as a peak in the real-space two- 
point correlation function or as wiggles in the Fourier-space 
power spectrum. This characteristic scale of BAO is con- 
sidered as a standard ruler of the different evolution stages 
of the universe, and as a dark energy probe with relatively 



et al 



2000), and the VIMOS-VLT Deep Survey ( |Le Fevre 
2005 ) , galaxy redshift surveys have revolutionized the 
view of the large scale structure of the universe. Recently, 
the WiggleZ Dark Energy Survey ( Drinkwater et aIT||2010 1 
and the SDSS-III Baryon Oscillation Spectroscopic Survey 
(BOSS; Schleg eTet al.|2009 | have measured the galaxy clus- 
tering to unprecedented precision and provided stringent 
constraints on the cosmological parameters. 

One of the most important features in the galaxy clus- 
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well-controlled systematics (Blake & Glazebrook 2003 Seo 


& Eisenstein|2003 1. Indeed, since its discovery ( 


Miller et al. 


2001 Eisenstein et al. 2005 Cole et al. 2005 


1, BAO has 



been providing ever improving constraints on cosmological 
parameters (e.g., Percival et al.|20"To Blake et al.|201l" An- 



derson et al.|2012 | 



Beyond the BAO feature, the full scale-dependence of 
the clustering of galaxies contains much more information 



and can be used to constrain cosmology (e.g., Tegmark et al. 
[20061 |Reid et al.|[20To] |Tinker et aLl|2ul2l |Cacciato et~ 
2012[ ) and the halo occupation statistics (e.g., Abazajian 
et al.|[2005l |Tinker et al.|[2005] |van den Bosch et al.||2007[ 
Zheng fc Weinberg|2007| |Zehavi et al.|2011[ ). From the per- 



spective of power spectrum P(k), the number of modes in- 
creases as k 3 , and the information content increases dramat- 
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ically as one goes to smaller scales. However, when one tries 
to draw information from high k, especially at low redshift, 
the density perturbations become non-linear and difficult to 



model (e.g., Smith et al. 2003 Heitmann et al. 2010 Jen 



nines et al.|2011[ ), which can introduce significant systematic 



errors in the recovered cosmological parameters 

The analysis of galaxy clustering often relies on N-body 
simulations and synthetic galaxy catalogs to model the non- 
linearity on small scales, as well as to estimate the cosmic 
and sample covariances. For example, the WiggleZ team 
has validated their model for non-lin ear galaxy power spec - 
trum using the GiggleZ Simulation^] (Parkinson et al. 20121, 
while synthetic galaxy catalogs based on the Large-Suite 
Dark Matter Simulations (LasDamaQ have been used in 
the galaxy clustering analysis of SDSS ( jChuang fc Wang 
2012} |Xu et"a!T1[2012| | . 

For upcoming surveys, synthetic catalogs generated 
from N-body simulations will likely be routinely used to 
calibrate galaxy surveys. However, N-body simulations are 
not free from systematics. In N-body simulations, galaxies 
are assigned to haloes or dark matter particles based on 



models like halo occupation distribution (Peacock & Smith 



2000 Scoccimarro et al. 2001 Berlind fc Weinberg 20021, 



2004 


), or semi-analytic models l 


White & Frenk|1991 


Kauff- 


mann et al.||1993| |Somerville & Primack||1999| |Cole et al. 


2000 


I. The galaxy populations predicted by simulations can 



be affected by intensive stripping in dense environment (e.g., 



Wetzel & White 20101 and the absence of baryons (e.g., 
Weinberg et al.|2008||Simha et al.|2012[ ). On the other hand, 
when one uses dark matter particles to model the behavior 
of galaxies, systematic errors may arise because the positions 
and velocities of galaxies do not necessarily follow those of 
dark matter particles (e.g., Wu et al.|2012 I. 

In addition, it has been shown that galaxies predicted 
from N-body simulations cannot recover the spatial distribu- 
tion of observed galaxies. For example, Wu et al. (in prepa- 
ration) have shown that in high-resolution N-body simula- 
tions of galaxy clusters, subhaloes tend to be prematurely 
destroyed and fail to predict the location of galaxies (also 
see Appendix |A| . The need to include "orphan galaxies" 
(galaxies not associated with subhaloes in simulations) to 
improve the completeness of predicted galaxies has been fre- 
quently addressed in the community (e.g., Gap et aL||2004 



Wang et al. 2006 Guo et al. 20111; however, even includ- 



ing orphan galaxies does not lead to consistent galaxy clus- 
tering at all scales . For example, Guo et al. (20111 have 



shown that the galaxy population generated using the semi- 
analytic model applied to the Millennium Simulations over- 



estimates the small scale clustering (also see Contreras et al. 
20131. 



In this paper, we examine the impact of the systemat- 
ics in N-body simulations on the predictions of galaxy clus- 
tering. We calculate the galaxy power spectrum based on 
the halo model, with inputs from the results of recent N- 
body simulations. We use the information of the full power 

1 http: //tao . it . swin.edu. au/partner-re sources/ simulations/ 
gigglez/ 

2 http: //lss .phy . vanderbilt . edu/lasdamas/ 



spectrum of galaxies to forecast the cosmological parameter 
constraints and determine at which scale these systematics 
start to become relevant. We specifically explore how these 
uncertainties will limit our ability to utilize the cosmological 
information from small scale. 

This paper is organized as follows. In Section [2] we re- 
view the halo model prediction for galaxy power spectrum. 
In Section [3] we present our fiducial assumptions and dis- 
cuss the information content associated with P{k). Section 
[4]explores the self-calibration of HOD parameters. Section[5] 
addresses the impact of various systematics in N-body sim- 
ulations and presents the required control of these sources 
of systematic error. We conclude in Section [6] In Appendix 
[A} we present the galaxy number density profile model used 
in this work. In Appendix |Bj we provide detailed derivation 
of the galaxy power spectrum based on the halo model. In 
Appendix [Cl we derive the the power spectrum covariance. 



2 HALO MODEL AND GALAXY POWER 
SPECTRUM: A REVIEW 

Throughout this work, we use the power spectrum of galax- 
ies P(k) as our clustering statistic. Possible alternatives in- 
clude the three-dimensional correlation function £(r), its 
two-dimensional analogue - the angular two-point function 
w(8), or the projected two-point function w p (r p ). While the 
Fourier-space power is more difficult to measure from the 
galaxy distribution, it is "closest to theory" in the sense that 
the other aforementioned quantities are weighted integrals 
over P(k). Therefore, it is easiest to see the effect of the un- 
certainties in theoretical modeling by using the power spec- 
trum. While these different functions measured in a given 
galaxy survey contain the same information in principle, in 
data analysis sometimes discrepancies occur (e.g., Anderson 
et al.|2012 i. 



2.1 Basic model 

In this section, we provide the key equations of the galaxy 
power spectrum derived from the halo model, following 



Scherrer fc Bertschinger (19911, Seljak (20001, and Cooray 



(2002). The detailed derivation is provided in Ap- 



fc Sheth l 
pendix[B] 

The halo model assumes that all galaxies are inside 
dark matter haloes. To model the distribution of galaxies, 
we need: 

(i) Statistics and spatial distributions of dark matter 
haloes: 

• Halo mass function, dn/dM, the number density of 
haloes as a function of halo mass. 

• Halo bias, b\M) = P hh (k)/ Pi in (fc), where Phh is the 
power spectrum of haloes and Pn n is the linear matter 
power spectrum. We limit our use of b(M) to large scales 
where b(M) is scale-independent. 

(ii) Statistics and spatial distribution of galaxies in a halo: 

• Halo occupation distribution function, P(N\M), the 
probability distribution function of the number of galaxies 
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in a halo of a given mass. The number of galaxies N is 
further split into the contribution from central galaxies 
JV ccn (0 or 1) and from satellite galaxies iVsat- 

• Galaxy number density profile, u(r\M), the radial de- 
pendence of galaxy number density inside a halo of a given 

mass. We normalize u such that J u(r\M)d 3 r — 1. We 

also use the density profile in Fourier space, u(k\M) — 



d 3 xu(x|M)e 



and u — ¥ 1 for small k. 



n ga i = (n ga i) = j 



(1) 



The mean galaxy number density is given by 

The power spectrum is contributed by two galaxies in two 
different haloes (the 2-halo term, Pgg) and two galaxies in 
the same halo (the 1-halo term, Pf v 



33 ■ 



P(k) 
P33{k) 



33 
1 



"gal 



dM^(N\M)b(M) 



Plin(fc) 



dM 



dn 
dM 



MJ f(k\M) 



(2) 
(3) 

(4) 



In the 1-halo term, 



((*> 



/(A- 1 W) = [{N Bat \M}u(k\M) 
1 



1)|M) \u{k\M)Y 
which takes into account the contribution from central- 



+ ^ (N S a.t(N Si 



satellite and satellite-satellite pairs (Berlind & Weinberg 
[2002] ). 

2.2 Redshift-space distortions 

In observations, one cannot recover the exact three- 
dimensional spatial distribution of galaxies, because the red- 
shifts of galaxies are impacted by their motions due to the lo- 
cal gravitational field and do not reflect their true distances. 
On larger scales, galaxies tend to move toward high-density 
regions along filaments, and these motions tend to squash 
the galaxy distribution along the line-of-sight and boost the 



clustering, a phenomenon known as the Kaiser effect ( Kaiser 



19871. On small scales, the virial motions of galaxies inside 



a halo tend to make the galaxy distribution in the redshift 
space elongated along the line-of-sight, causing the so-called 
"Fingers-of-God" effect and reducing the small-scale power. 
In this section, we briefly describe the model we use for 



the redshift-space distortions (RSD) for P(k), following Sel- 



jak|pCMt , |Whit^p00l| ), and |Cooray fc Sheth|p002| . We 

adopt one of the simplified models - assuming the velocity 
distribution function to be Gaussian - and note that the im- 
provement of the RSD model is currently an active research 
area. 

Since the 1-halo term involves the halo scale, we only 
consider the virial motions of galaxies inside a halo, which 
can be modeled as (Pe acock|1999 1 

-i[fc CTv (A/) M ] 2 



where a v (M) is the velocity dispersion of galaxies inside a 
halo of mass M , and n = k • f . We average over fi to obtain 
the angular-averaged 1-halo term 



dM 



dn 
dM 



M f R (k\M) , (6) 



where 

N 
2 



M )f R (k\M) = [{N*t\M) u(k\M)Ri(M) 



The factor 



+ 2<iv sat (iv; 



R P (M) 



1)|M) \u{k\M)\ 2 R 2 {M) 
^Ferf[fea v (M)vW2] 



comes from averaging over jx. 

For the 2-halo term, we multiply the large-scale and 
small-scale effects together (see Peacock|1999 and section 4 
in |Peacock fc Dodds||T9"9"4"| ) 



5| al (k) = 5 gal (k) + /(n M )Mk)M e~ 3 



-l[fc CTv (M)/a] 2 



(8) 



The first part is the familiar Kaiser result with /(Qm) = 
dlnD/dlna, where D(a) is the linear growth function of 
density fluctuations and a is the scale factor. We note that 
8 m is the density fluctuation of dark matter. The calcula- 
tion thus includes not only the galaxy power spectrum, but 
also the matter power spectrum and the matter-galaxy cross 
power spectrum. After averaging over fi, we obtain 



P33W 



where 



F g (k) 



1 

"■gal 



Pi + \P 9 Fv 



dn 



5 v 



dM jM ( N \ M } K M )Ri( M ) 



(9) 



(10) 



<5gai(k) = <5 g ai(k)e~ 



(5) 



comes from the contribution of <5 ga i, and 

F v (k) = f(Chs)~ [ dM-^-Mb(M)R 1 (M) (11) 
p J dM 

comes from the contribution of 5 m - We note that u m (k\M) 
is the dark matter density profile normalized the same way 
as u. We assume that u m (fc|M) follows the Navarro-Frenk- 
White profile ( [Navarro et al.|1997[ ) throughout the paper. 

The left panel of Fig. [1] shows an example of the con- 
tribution to the total galaxy power spectrum by the 1-halo 
(blue) and 2-halo (red) terms. The input of halo model will 
be detailed in Section ETTl The solid and dashed curves corre- 
spond to including and excluding the effect of redshift-space 
distortions. As can be seen, including RSD significantly re- 
duces the power at small scale. We also note that the scale 
where 1- and 2-halo terms cross shifts very slightly due to 
RSD. 

The right panel of Fig. [T] presents the comparison be- 
tween our model and one of the power spectra from the 
WiggleZ survey, provided by Parkins on et al.| p012). The 
green dashed/blue solid curve corresponds to the theoreti- 
cal P(k) before/after convolving with the window function 
of WiggleZ. We assume that the HOD is described by the 5 
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k Ih/Mpcl k Ih/Mpcl 

Figure 1. Galaxy power spectrum calculated based on the halo model. Left: Blue and red curves show the 1- and 2-halo terms, 
respectively. Solid curves include the redshift-space distortions, while the dashed curves do not. Redshift-space distortions greatly reduce 
the power spectrum at small scale (the "Fingers-of-God" effect), and only slightly shift the scale where 1- and 2-halo terms cross. Flight: 
Our model fit to WiggleZ data from [Parkinson et al.| | |2012[ l. The blue solid curve shows the theoretical P(k) with the best-fit HOD 
parameters and has been convolved with the observational window function. 



parameters in Equation (271; we fit for these 5 parameters 



and show the model corresponding to the best-fit param- 
eters. This figure is only for the purposes of illustration; 
details of the fitting procedure will be presented in a future 
paper. 



3 BASELINE MODEL AND FIDUCIAL DARK 
ENERGY CONSTRAINTS 

In this section, we describe our inputs for the halo model, 
assumptions about the survey, predictions for the galaxy 
power spectrum, and Fisher matrix calculations of the sta- 
tistical and systematic errors. 



3.1 Baseline assumptions 

We use the virial mass M v i r of dark matter haloes through- 
out this work and adopt the following functions in our halo 
model calculations: 

• Mass function (dn/dM) and halo bias (b(M,z)): based 
on the fitting functions in Tinker et al. (2008 20101, which 



are derived from N-body simulations and can achieve ap- 
proximately 5 per cent accuracy for mass function and 6 per 
cent for halo bias. 

• Density profile: based on the universal Navarro-Frenk- 



White (NFW) profile (Navarro et al. 19971, which is de- 



scribed by one concentration parameter c v 
unrw(r\M v i T ) 

Cvir(M vir ) 



{r/r 3 ){l+r/r 3 y 



(12) 



• Concentration-mass relation: based on the relation in 
Bhattach arya et al.| ( |20lTj ), which will be further discussed 
in Section |5.1 1 In the presence of significant scatter in the 
c-M relation, we perform the integration 

u(r\M viv ) = / dcvir-P(c vir |M vir )w(r|M vir (c vir )) . (13) 



Throughout this paper, we assume that c v h- has a Gaussian 
distribution for a given M v j r with a scatter of 0.33, based on 



the finding of Bhattacharya et al. (2011 \ 



• Velocity dispersion: based on the scaling relation be- 
tween dark matter velocity dispersion and halo mass from 



Evrard et al.| | |2008[ | 

cr° M = 1082.9 



h(z)M 2 . 



km s 



(14) 



1O 15 M 

We convert the mass M200 to M v i T based on Hu & Kr avtsovl 
(20031. Since the scatter in the velocity dispersion is ex- 



pected to be small (4 per cent), it is not included in our 
calculation. 

• Halo occupation distribution: based on the parameteri- 



zation from Zheng et al. ( 2005 ) and the fiducial parameters 
from Coupon et al. ( [ 2012 1 , both of which will be discussed 
in detail in Section [4] 

We assume a fiducial galaxy survey covering / s k y = 1/3 
of the full sky (about 14,000 square degrees), similar to the 
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BigBOSS experiment We assume that the survey depth 
is comparable to the CFHTLS results presented in [Coupon] 
et al. (20121; specifically, we assume 5 redshift bins between 



0.2 < z < 1.2, and the limiting magnitude in each bin is sum- 
marized in Table [T] We assume no uncertainties in the red- 
shift measurements of galaxies. Given that the assumption 
of such a deep, wide spectroscopic survey may be somewhat 
optimistic, our required control of systematic errors may be 
somewhat more stringent than what BigBOSS needs. 

We include seven cosmological parameters, whose fidu- 



cial values are based on the WMAP7 constraints (Ko- 



matsu et al. 20111: total matter density relative to criti- 



cal Om = 0.275; dark energy equation of state today and 
its variation with scale factor wo — — 1 and w a — re- 
spectively; physical baryon and matter densities Q.bh 2 = 



0.02255 and fi M /T 



0.1352; spectral index n s 



0.968; 



and the amplitude of primordial fluctuations A = As(k = 
0.002fe _1 Mpc) = 2.43 x 10" 9 . We assume a flat universe, 
thus dark energy density S7de = 1 — 



3.2 Likelihood function of P(k) and error 
forecasting 

Here we follow the derivations in |Scoccimarro et aL ( 1999 1 
and Cooray & Hu (2001 1 but use a different convention for 



Fourier transform (see Appendix |B|. If we assume a thin 
shell in In A; space with width Sink around lnfc^, the power 
spectrum estimator reads 



P(ki) = 



where 



V 3 {h) = 4?rfc?<Slnfc 



(15) 



(16) 



and l/n ga i accounts for the effect of shot noise. The first 
term of P is calculated based on the halo model results de- 
scribed in Section \2. 21 

The covariance of power spectrum is given by 



PikJPikj)) - (P(ki 
(2nf P[kif 



(17) 



V, V s {ki) 



where the second term indicates the connected, 
Gaussian term given by the trispectrum 



d 3 ki 



d J k 2 



ki Vs(kA J k V s {k 3 ) 



T(ki,-ki,k 2 ,-k 2 ) 



(18) 



We provide the detailed derivation in Appendix [C] In 
Equation \17\ V z is the volume of the redshift bin, V z = 

n s u rvC y J r 2 (z)/ H(z)dz, where the integral is performed 

over the redshift extent of the bin. 

The calculation of Ty involves 4-point statistics, which 



is non-trivial to calculate. Fortunately, Cooray & Hu (2001 1 
have shown that only the 1-halo term dominates at the scale 



http : //bigboss . lbl . gov/ 



where the contribution of Tij to dj is not negligible; there- 
fore, we only need to calculate the 1-halo contribution: 

dn 



T (ki,k 2 , k 3 , ki) = 



dM- 



dM 



M)f(k u k 2 ,k 3 ,k 4 ;M) 



(19) 



where 



+ 



M 

N sat 
3 

iVsat 

4 



\ f(k 1 ,k 2 ,k 3 ,k 4 ;M) 



M 



(n? =1 fi(fc i |Af) + eye.) (20) 



M ) Ui =1 u(ki\M) 



Analogous to the case of P gg considered in Section 2.1 



the first term accounts for quadruplets composed of 1 cen- 
tral and 3 satellite galaxies, and the second term accounts 
for the quadruplets composed of 4 satellite galaxies. We 
assume that P(N sat \M) follows the Poisson distribution 
so that |M) = (iV S at|A/) 3 /3! and ({ N l at )\M) = 

<iV sat |Af} 4 /4!. 

We employ the Fisher matrix formalism to forecast the 
statistical errors of the cosmological and nuisance parame- 
ters based on the fiducial survey. The Fisher matrix reads 



a/3 



\ ^ ^— v dPi 



(27T) 



V z 



P"} 

2nkfS\nk S ' j+T ^ 



-EE 



dlnP l 
d6 a 



(27T) 



da + 



V z 2Tvkf5lnk ' J Pi P., 



din P, 



00 



P 



where a and ft are indices of model parameters, while i and j 
refer to bins in wavenumber which have a constant logarith- 
mic width 5 In k and extend out to the maximum wavenum- 
ber fcmax- We adopt Sink = 0.1, which has been tested to 
be small enough to ensure convergence. The best achievable 
error in the parameter 6 a is given by 

ag a = [(F-% a ] 1/2 . (21) 

Throughout this work, unless otherwise indicated, the 
full set of parameters considered is given by 



0fuii = (wo, w a , ^de, Huh ,0. b h ,n s ,\nA\ 

logio Afmin, Cri ogl0 M , log 10 M , log 10 Ml, Q S at) 



(22) 



The first seven are the cosmological parameters introduced 
in Section [3.1| while the last five are the nuisance parameters 
describing the HOD and will discussed in Section [4. 1| 



3.3 Fiducial constraints without systematics 

To represent the statistical power of an upcoming galaxy 
redshift survey, we consider the inverse of the square root 
of the dark energy figure-of-merit, originally defined as the 
inverse of the forecasted 95 per cent area of the ellipse in the 
wo-Wa plane ( |Huterer fc Turner|200i] |Albrecht et"aIT2006) . 
In other words, our parameter of interest is \fa{w a )<y(wp), 
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3.4 Systematic bias in model parameters 




no RSD 

add RSD 

add Trispectrum 



10° 
[h/Mpc] 



Figure 2. Dark energy information content from P(k), based on 
our fiducial survey assumptions. The x-axis corresponds to the 
largest k (smallest scale) assumed to be reliably measured and 
interpreted. The blue curve corresponds to a Gaussian likelihood 
function and assumes no rcdshift-space distortions; it leads to 
unrealistically tight constraints for large fc max . The green curve 
includes rcdshift-space distortions, and the red curve further in- 
cludes the trispectrum correction to the covariance matrix. As 
can be seen, including these two effects reduces the small-scale 
information. 



In this work, we estimate the systematic shifts in parame- 
ter inference caused by using an inadequate model. In par- 
ticular, if we assume a problematic model that produces a 
power spectrum P sys (k) that systematically deviates from 
the truth Pad(k), we will obtain parameters that systemat- 
ically deviate from their true values: sys — 9ftd + AO. The 
systematic shifts in parameters can be obtained through a 
modified Fisher matrix formalism ( Knox et al.|1998 l: 

= £(*"- 



A0 a 



(23) 



where 



(2tt 



3 



Sij + 



V z 27rfc?<51nfc 13 PtP. 



din Pi 



(24) 



To determine the significance of systematic errors, we calcu- 
late the systematic shifts A^tot in the full high-dimensional 
parameter space, 



Ax?ot = A8 T FA8 , 



(25) 



where AO is the vector of the systematic shifts of parame- 
ters. Both AO and the Fisher matrix F include cosmological 
and nuisance parameters. The systematic bias is considered 
significant if the inferred r? ays lies outside the 68.3 per cent 
confidence interval of the Gaussian likelihood function cen- 
tered on #fid; in other words the bias is "greater than the 1-cr 
dispersion." For example, in a full 12-dimensional parame- 
ter space considered here, 68.3 per cent confidence interval 
corresponds to Axtot — 13.7. 



where w p is the pivot that physically corresponds to w(a) 
evaluated at the scale factor where the constraint is the 
best. This quantity takes into account the temporal vari- 
ation of dark energy, and the square root serves to com- 
pare it fairly to the constant w; the two quantities, a(w) 
and sj a(wo)<j(w p ), tend show very similar behavior. For 
our fiducial survey, the statistical error in our parameter 
combination of interest is \J a{w a )a{w p ) = 0.4 (or 0.004) 
for fc max = 0.1 (or 1) ftMpc -1 , without external priors. 
When we add the Planck Fisher matrix (W. Hu, private 
communication), \J o-(w a )a(w p ) becomes 0.002 (or 0.0002) 
for fcmax = 0.1 (or 1) /iMpc -1 . 

Fig. [2] presents the expected dark energy constraints as 
a function of fc max , without nuisance parameters or system- 
atic errors for the moment, for three levels of sophistication 
in the theory. We proceed in steps: the blue curve corre- 



sponds to no redshift-space distortions (Section 2.1 1 with 
a Gaussian likelihood function. In this case, the dark en- 
ergy constraints increase sharply with fc max , indicating that 
these assumptions are unrealistic. The green curve includes 
the redshift-space distortions (Section |2.2| ), which reduce the 
dark energy information from small scales. The red curve 
further includes the effect of non-Gaussian likelihood (Tij 
from Equation (18l), which reduces the information at high 
k even more. 



4 SELF-CALIBRATION OF HOD 
PARAMETERS 

In this section, we focus on the efficacy of self-calibrating 
the HOD parameters, that is, determining these parameters 
from the survey concurrently with cosmological parameters. 
Since these HOD parameters are not known a priori, one 
usually marginalizes over them along with cosmological pa- 



rameters (e.g., Tinker et al. 2012 I, which inevitably increases 



the uncertainties in cosmological parameters. Here we focus 
on the statistical uncertainties and assume no systematic 
error; in the next section, we will compare these statistical 
errors with systematic shifts of parameters. 

We focus on two parameterizations of HOD: one is 



based on Zheng et al. |2005), and the other is based on 
a piecewise continuous parameterization. 



4.1 Zheng et al. parameterization 

The halo occupation distribution describes the probability 
distribution of having N galaxies in a halo of mass M. 
In principle, the HOD is specified by the full distribution 
P(N\M); in practice, modeling of the two-point statistics 
only requires (jV ceft |M), (N sat \M), and (iV 5at ( jV aat - 1)\M) 
We follow the HOD parameterization from | Zheng et al.| 
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Redshift 


M g - 5 log 10 h 


log 10 M min 


logio Mi 


logio M o 


flog 10 M 


«sat 


0.2 < z < 0.4 


-17.8 


11.18 


12.53 


7.54 


0.40 


1.10 


0.4 < 2 < 0.6 


-18.8 


11.48 


12.66 


10.96 


0.43 


1.09 


0.6 < z < 0.8 


-19.8 


11.77 


12.83 


11.54 


0.50 


1.07 


0.8 < 2 < 1.0 


-20.8 


12.14 


13.21 


12.23 


0.35 


1.12 


1.0 < 2 < 1.2 


-21.8 


12.62 


13.79 


8.67 


0.30 


1.50 



Table 1. Fiducial values for the halo occupation distribution parameters, adapted from|Coupon et al.| (|2012|) based on CFHTLS. 



( |2005| ) , which separates the contribution from central and 
satellite galaxies: 



<7V ccn |M) = 



1 + erf 



logio M ~ lo glO M mir. 



°"log ln M 



{N nt \M) = (N cen \M) x 



M - Mp 
Mi 



(26) 
(27) 



The first equation describes the contribution from the cen- 
tral galaxy; M m i n corresponds to the threshold mass where 
a halo can start to host a galaxy that is observable to the 
survey, and cri ogl0 m describes the transition width of this 
threshold. The second equation describes the contribution 
from satellite galaxies, whose number is assumed to follow a 
power law, and Mo is the cutoff mass. In addition, we make 
the widely-adopted assumption that P(A r sat |M) follows a 
Poisson distribution, i.e., 



<AU(iVs a t - 1)| AO = (iNU) 5 



(28) 



We adopt the fiducial values from Coupon et al. (2012), 



which are constrained using the projected angular two-point 
correlation function w(6) from the Canada-France-Hawaii 
Telescope Legacy Survey (CFHTLS) out to z — 1.2. We use 
the same binning and limiting magnitude as in Coupon et al. 
( |2012[ ); the values are summarized in Table [T] We do not use 
the error bars quoted there as our priors because we would 
like all parameters to be self-calibrated consistently. 

Under these assumptions, we have 5 nuisance param- 
eters (log 10 M min , <Ti ogl0 M, log 10 M , log 10 Afi, a sat ) for 
each of the 5 redshift bins, i.e., 25 parameters in total. We 
assume that each of the 5 distinct nuisance parameters varies 
coherently across the 5 redshift bins, and is therefore de- 
scribed by a single parameter. Under this assumption, in- 
stead of 25 nuisance parameters, we only use 5 nuisance 
parameters to describe the uncertainties of all HOD param- 
eters. We parametrize the variations around the fiducial val- 
ues: 



Q HOD , /jHOD.fid 

i i = hid; 



(i = l,...,5) 



(29) 



where hi are the dimensionless parameters describing the 
uncertainties of the aforementioned 5 HOD parameters. 

We explore how well these parameters can be self- 
calibrated by P(k) without the aid of priors. Fig. [3] shows 
the dark energy constraints as a function of fe ma x, with fixed 
nuisance parameters (red) and with these 5 marginalized 
nuisance parameters (dark blue). The redshift-space distor- 
tion and the full covariances of P(k) are included in this 
calculation. Clearly, the dark energy constraints are weak- 



ly 

IT 



io :i 
io 2 

10 1 
10" 
IO" 1 

IO" 2 

IO" 3 
IO" 4 



Solid: no prior; dashed: Planck prior 



no nuisance parameters 

5 Zheng HOD parameters 

2 para for cen; 5 para for sat 



1-halo dominated 




10° 
[h/Mpc] 



Figure 3. Self-calibration of HOD parameters. We show the dark 
energy constraints as a function of the highest k used in the sur- 
vey. The red curve corresponds to no nuisance parameters. The 
dark blue curve corresponds to five nuisance parameters based on 
the parameterization in |Zheng et al.| (|2005), while the cyan curve 
corresponds to a piecewise continuous parameterization for satel- 
lite galaxies, with one parameter in each of the five mass bins. 
Including nuisance parameters in either parametrization system- 
atically increases the dark energy uncertainties by one or two 
orders of magnitude. The dashed curves include the Planck prior 
and assume the same nuisance parameters as their solid-curve 
counterparts. 



ened by approximately about 1 or 2 orders of magnitude 
when we marginalize over HOD parameters. 



4.2 Piecewise continuous parameterization of 
HOD parameters 

One potential worry with the parametrization in Equa- 
tion (27 1 is whether (jV sa t|M) is accurately described 
by a power law. To address this, we propose a less 
model-dependent, piecewise continuous parameterization for 
(iVsat|Af). We divide the halo mass range into Tibms bins and 
assign a parameter describing the uncertainties of HOD in 
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-■ 0.1 [h/Mpc] 



= 0.4 [h/Mpc] 




k„ax = 1.0 [h/Mpc] 



b 




no prior 

(T = 1 
(T = 0.1 

(To = 0.01 



Figure 4. Dark energy constraints with self-calibrated piccewise continuous HODs. The three panels correspond to femax — 0.1, 0.4, and 1 
hMpc" 1 . The x-axis corresponds to the number of parameters used to describe (N s at\M) per decade of mass, and the y-axis corresponds 
to the dark energy constraints. The black curve corresponds to no prior, and the constraints are degraded with larger number of 

parameters. The other curves correspond to consistently adding a fixed total prior per decade of mass; that is, aj i = o~o-J Np W decade > 

where ctq = 1, 0.1, or 0.01. We note that one parameter per decade is sufficient for fc max = 0.1 hMpc - , while two or three parameters 
arc needed for higher fc max . Note that for higher fe max , the HOD parameters arc better self-calibrated in the absence of prior. 



each bin. That is, 

"bins 

(iV sat |M) = £ e(Mi,M i+1 )fi{N ssl t\M) m , (30) 

i=l 

where 0(Mi, Mi+i) defines the binning and equals 1 in 
[Mi, Mi+i] and elsewhere, while fa is the free parameter in 
bin i and describes the uncertainty of (JV sa t|M) in this bin. 

We still assume P(N aa , t \M) to be a Poisson distribution, 
which now implies 

(AT sat (iV S at-l)|M) = (iV sat |M) 2 

"bins 

= @{Mi,M i+1 )f!{N Bat \M)l d . 

(31) 

We start with one parameter per decade in mass, using 
JT-bins = 5 parameters between 10 11 and 10 16 fc _1 M0, equally 
spaced in log 10 M. The cyan curve in Fig. [3] corresponds to 
marginalizing over these five piecewise continuous parame- 
ters for (AT aat |M> and two parameters (log 10 M min , <7i ogl0 M ) 
for (JV cen |Af}, with no prior on them. 

Fig. [4] shows the dependence of dark energy con- 
straints on the number of parameters describing (N sa ,t\M) 
per decade of mass, N peT dec a de- The three panels correspond 
to fc max = 0.1, 0.4, and 1 /iMpc -1 . The Planck prior is in- 
cluded in this calculation. The black curve corresponds to 
no prior on /j and shows strong degradation with increas- 
ing iVp er doc a dc as one would expect. When & max is small, 
the prior knowledge of HOD is important to improve the 
dark energy constraints. On the other hand, when fc max is 
large, HOD can be well self-calibrated, and the prior is not 
as important. 

To enable a fair comparison of priors, however, we would 
like to increase the freedom in the HOD model while fixing 



the overall uncertainty per decade. To do this, we impose a 
fixed prior per decade of mass: 

°7i ~ °"0\/ -/Vpcr doc a dc (32) 

so that the total prior per unit log 10 M, when we add the 
Fisher information from all fi , is ao regardless of the value 
of N p 

cr dcc a de ■ 

The red/green/blue curves in Fig. [4] correspond to im- 
posing cr = 1/0.1/0.01. For fc max = 0.1/iMpc" 1 , the dark 
energy constraints converge when we use one parameter per 
decade of mass regardless of the prior on nuisance param- 
eters. When fc max > 0.1/iMpc -1 , a few more parameters 
per decade in mass are required for the results to converge. 
For example, for fc max = 0.4 (TO) /iMpc -1 , we need 2 (3) 
parameters per decade to ensure convergence. The required 
number of parameters also somewhat depends on the prior. 



5 SYSTEMATIC ERRORS DUE TO 
THEORETICAL UNCERTAINTIES 

In this section, we explore the impact of four sources of 
theoretical uncertainties coming from N-body simulations on 
the constraining power of P(k). These sources of systematics 
are: 

• Concentration-mass relation 

• Deviation of u from the NFW profile 

• Deviation of Nsst from the Poisson distribution 

• Velocity bias 

In particular, we address the following points: 

• With the current level of uncertainties, what are the 
systematic errors in the prediction of P(k)7 What are the 
biases in the parameter inference caused by these systemat- 
ics? 
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• What is the smallest scale (largest fc max ) allowed by the 
current level of uncertainties? 

• What is the required reduction of these uncertainties if 
we would like to push to higher fc max ? 

We use Ax?ot — 13-7 (l-cr errors in a 12-dimensional param- 
eter space; see Equation 25 1 as our criterion of significant 
impact from systematic error. We calculate A^tot using the 



We compare it with the model from B01: 



Fisher matrix for 7 cosmological parameters (Section 3.1l 



and 5 HOD parameters (Section 4.1|. Throughout this sec- 
tion, we use the Planck prior but no priors on HOD param- 
eters. We believe these two assumptions reflect reality in 
the next 5-10 years, when Planck data will firmly pin down 
certain combinations of cosmological parameters, while the 
determination of the nuisance HOD quantities will still be in 
flux. We note that unbiased priors always decrease the result- 



ing systematic bias (for a proof, see appendix A of |Bernstein 
fe Huterer|2010 1 and make the theoretical requirements less 
stringent. Thus, any prior on HOD parameters will allevi- 
ate the systematic biases and make the required / aya less 
stringent. 

The summary of the impact of these systematics are 
presented in Fig. [5] and Table [2] 

5.1 Concentration— mass relation 

In the halo model, the 1-halo term depends on the num- 
ber density profile of galaxies, u(k\M). We assume that 
the galaxy distribution follows the dark matter distribution, 
which is well-described by an NFW profile. We then use the 
concentration-mass relation of dark matter haloes from the 
literature to compute u{k\M). 

The concentration-mass relation has been calibrated 



with dark matter N-body simulations (e.g. 


Bullock et al. 


2001 


|Neto et al. 2007 


Maccio et al.|2008||Duffy et al.|2008| 


Prada et al. |2012| B 


lattacharya et al. 2011 |Kwan et al.| 


2012 


1 and hydrodynamical simulations (e.g., 


Lau et al. 2009 


Duffy et al.||2010 Rasia et al.||2013l. Several observational 



programs are also working toward pinning down this relation 
(e.g., |Coeet al.|2012[ ). However, 10-20 per cent of uncertain- 
ties in the concentration-mass relation remain (see, e.g., the 
Bha ttacharya et al.|2011 1 . 

investigate the impact of uncertainties in the 
concentration-mass relation by comparing the models from 



review m 
We 



Bullock et al. 


(2001 B01 hereafter) and the recent cal 


ibration from 


Bhattacharya et al. (2011 Bll hereafter) 



These two models represent two extreme cases of the 
concentration-mass relation. As we will see later, since the 
c-M relation is not a dominant source of systematic error, 
using these two extreme cases sets the upper limit of the 
systematic bias caused by the c-M relation. We assume a 
scatter of 0.33 for the c-M relation in both cases. Ignoring 
this scatter will lead to approximately 0.5 per cent difference 
in P(k) at k w 1 /iMpc" 1 . 

Our baseline model is from the recent formula given by 
Bll (based on virial overdensity) : 



=(") = 



D(z)°' 78 7.9j/~ 



1.12 



(33) 



D{z) 



5 x lQis/i-iMr. 



+ 0.53 



c(M vir ) = 



9 



M vir 
1 + z V M* (z) 



(34) 



These two calibrations agree near M, at z = 0. 

The top left panel of Fig. [5] shows the relative change 
in the power spectrum P(k), evaluated at 5 redshifts, due 
to the difference between B01 and Bll. We find that P(k) 
based on B01 is in general lower than that based on Bll, 
because B01 predicts lower concentration at the high-mass 
end. Although B01 predicts higher concentrations at the 
low-mass end, these haloes rarely contribute to the 1-halo 
term and thus do not significantly boost clustering. 

The inset in this panel shows the systematic shifts in 
the parameter space caused by different models, which are 
characterized by A^tot- It can be seen that the system- 
atic error starts to be comparable to the statistical error 
(A^tot — 13.7, marked by a horizontal dashed line) at 
fcmax = 1-2 foMpc -1 , which makes it a relatively unimpor- 
tant source of systematic error. 

We would now like to study the effects of improved cal- 
ibration in the c-M relation. A natural way to do this is to 
assume that the difference between the two extreme predic- 
tions has been reduced by some constant factor, and that the 
new value interpolates between the two original extremes. 
We define the interpolated value as 

Cintorp (M) = Cfid (M) + / sys (Calt (M) - Cfld (A/)) , (35) 

where Cfld(M) and c a i t (M) are respectively the fiducial 
(say, Bll) and the alternate (say, B01) models for the 
concentration-mass relation. Here / sys is a tunable parame- 
ter that allows us to assess the effect of a fraction of the full 
systematics. The limiting cases are: 



/sys 

/sys 



no systematics 

1 fiducial systematics 



For a higher fc max , the tolerance of systematics is smaller, 
and /sys provides a measure for required reduction of sys- 
tematics. For a given fc max , we search for the appropriate 
/sys value that makes the systematic negligible]^] 

The blue curve in Fig. [6] shows the requirement on / sys 
from the c-M relation as a function of fc max . For all practical 
fc max values, c-M does not require more precise calibrations 
from N-body simulations. The results are summarized in the 
"c-M relation" row of Table [2] 



5.2 Galaxy number density profile: deviation from 
NFW 

Our fiducial model assumes that the galaxy distribution in- 
side a halo is described by the NFW profile. However, N- 
body simulations have shown that the distribution of sub- 
haloes in cluster-size haloes tend to be shallower than the 

4 Note that some fraction / S y S of the systematics does not triv- 
ially lead to the same fractional shift in P(k) because the c-M re- 
lation (and most other systematics) enters non-linearly into P(k). 
We therefore need to perform a separate calculation of P(k) for 
each / S ys. 
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Figure 5. Systematic differences in P(k) caused by the four sources of errors discussed in Section[5] In each panel, the main figure shows 
the fractional difference in P(k) in the five redshift bins, while the inset shows the systematic error Axfot 

as a function of /c max . The l-cr 



deviation in the f 2-dimensional parameter space, A% t 



13.7, is marked by the horizontal dashed line in each inset. 



NFW profile, and also shallower than the observed galaxy 



number density profile (e.g., Nagai & Kravtsov 2005 Die- 
mand et al.|2004 i. These deviations could be related to in- 
sufficient resolution or the absence of baryons in N-body 
simulations - the so-called "overmerging" issue. Several au- 
thors have proposed models for "orphan galaxies" to com- 
pensate the overmerging issue; however, these models do 



not always recover the observed galaxy clustering (e.g., Guo 



et al. 20111. The exact cause for these issues is still un- 
certain; nevertheless, the uncertainties associated with the 



distribution of subhaloes will likely impact the modeling of 
galaxy clustering. 

In this section, we investigate whether the uncertainties 
in the galaxy number density profile lead to a significant 
systematic bias. To model the possibility that the galaxy 
distribution is shallower than dark matter in the inner re- 
gion of clusters, we adopt the subhalo number density profile 
measured from Wu et al. (in preparation), which is also il- 
lustrated in Appendix [A] Fig. |A1| presents one example of 
the galaxy number density profile measured from an N-body 
simulation. Based on this result, we model the subhalo num- 
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Systematic 
Difference 


^max 

allowed 


fcmax = 0.3/lMpC 1 


fcmax = 1 h Mpc 1 


AP/P(z = 0.3) 


Deviation (<r) 


/sys req. 


AP/P(z = 0.3) 


Deviation (<r) 


/sys req. 


c-M relation 


Bll vs. B01 


1.2 


7.8c-05 





none 


0.0036 


0.012 


none 


Profile 


NFW vs. cored 


1 


0.00015 





none 


0.0067 


0.58 


none 


P(AT 8at ) 


a=l vs. 1.02 


0.27 


0.0037 


2.2 


0.78 


0.016 


17 


0.21 


Velocity bias 


b v =l vs. 1.15 


0.14 


0.026 


33 


0.11 


0.052 


108 


0.034 



Table 2. Summary of the effects of the four sources of systematic error considered in Section [5] Note that "/sys req." is the required 
reduction factor in the amplitude of the systematic difference so that it becomes a \-a effect in the full parameter space. 




Miiax 



10° 
[h/Mpc] 



Figure 6. Required reduction of systematic errors, shown as the 
fraction of the current errors, for the four sources of systematic er- 
rors discussed in this paper, as a function of maximum wavenum- 
ber considered in the survey. Note that the velocity bias requires 
the greatest improvement relative to the current knowledge. 



ber density profile as 

u{r\M) = f Burv (M, r) x u NFW (r\M) , (36) 

where unfw(?"| M) is the NFW profile , and / sur v is the "sur- 
viving fraction" of galaxies given by 

/ surv = 1 - 0.99e- a{r/R ^ 

M vir \ 2 (37) 



a^O.OOS^log^^ 

We note that / sur v is smaller for higher host halo mass and 
smaller radius, where the effect of overmerging is stronger. 

The top right panel of Fig. [5] shows the difference in 
P(k) caused by this cored profile. As expected, the deficit of 
galaxy number at small scales leads to lower power at high 
k. In addition, the suppression is stronger at low redshift 
because massive clusters are more abundant at low z. The 
inset shows the corresponding A^tot as a function of fc max ; 



We model the interpolated systematic error in density 
profile as 

UinterpCfclM) = Ufi d {k\M) + f sys [£i a i t (fcjAf) - U Rd (k\M)] . 

(38) 

We again search for the required / sys as a function of fc max . 
The result is shown by the green curve in Fig. [6] Like the 
c-M relation, the density profile of galaxies does not require 
more precise calibrations for all practical fc max . The results 
are summarized in the "Profile" row of Table H 



5.3 Deviation from Poisson distribution 



In our fiducial model, 
be Poisson-distributed; 
ment is given by (N s 

a = \/(N sa .t(N si 



Kolchin et al. (20101 



P(iV sa t|M) is assumed to 
that is, the second mo- 
t (JV«t-l)> = (A U) 2 , or 
l))/(A r sa t) = 1. However, Boylan- 
that the number of 



s a t / 

have shown 



subhaloes for a given halo mass deviates from the Poisson 
distribution (their fig. 8). In addition, Wu et al. (2012) 
have shown that the extra-Poisson scatter depends on 
how subhaloes are chosen and depends on the resolution. 
Therefore, it is still unclear whether P(N sa .t\M) follows a 
Poisson distribution. To assess the impact of the possible 
extra-Poisson scatter, we adopt a = 1.02 in our 1-halo term 
(following Boyl an-Kolchin et al.|2010 l, and noting that this 
choice of a brackets the various possibilities explored in|Wu| 



et al. (2012 fig. 3 therein). 



The bottom left panel in Fig. [5] shows the impact of 
a = 1.02 on P(fe), relative to the fiducial Poisson case 
with a = 1. The extra-Poisson scatter only impacts the 
1-halo term; therefore, the large-scale P(k) is unaffected. 
At small scales, P(k) is boosted by less than 3 per cent. 
For different redshifts, AP/P takes off at different k, re- 
flecting the varying scale where 1-halo and 2-halo terms 
cross. We also note that at high k, AP/P bends down- 
ward, reflecting the fact that the 1-halo term includes 

(N«t\M) u+\ (A sat (A sat - 1)|M) u 2 . When P(A sat |M) is 
super-Poisson, more galaxy pairs are expected, and the 1- 
halo term gets more weighting of u 2 (u < 1); thus, P{k) 
becomes lower at high k. 

The inset in the bottom left panel of Fig. [5] shows that 
the systematic shifts dominate statistical errors at fc max = 
0.3 ft Mpc -1 . We model the partial uncertainties in a as 



the systematic shifts dominate at k u 



1 ft Mpc 



Qinterp = Ctfld + /sys(«alt — «fld) 



(39) 
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The red curve in Fig.[6]shows the required / sys as a function 
of /tmaxj for femax = 1 ftMpc" 1 , the required / sys = 0.2. The 
results are summarized in the P(N sa ,t) row of Table [2] 

5.4 Velocity bias 

The small-scale redshift-space distortions (also known as the 
"Fingers-of-God" effect) are usually modeled as an exponen- 
tial suppression of power with the term e~' fc<TvM ' . Here cr v is 
the velocity dispersion of galaxies inside a cluster <Jv al (M v i r ). 
Assuming that the motions of galaxies trace those of dark 
matter particles, we use the velocity dispersion of dark mat- 
ter particles inside a halo, cr v (M v ir), which has been well- 
established using simulations ( Evrard et al.|2008| . However, 
the velocity dispersion of galaxies inside a cluster crj al is not 
necessarily the same as <r° . The the ratio between the two 
is defined as the velocity bias 



K = 



(40) 



The exact value of b v and its redshift dependence are still un- 
der debate. Subhaloes from N-body simulations have shown 
fe v > 1 (e.g., Colin et al. 20001. In addition, Wu et al. (in 
preparation) have shown that the exact value of b v depends 
on the selection criteria applied to subhaloes, on the resolu- 
tion of simulations, and on the location of subhaloes. On the 
other hand, simulated galaxy population based on assigning 



subhaloes to dark matter particles (e.g., Faltenbacher & Die- 



mand 2006 \ or based on hydrodynamical simulations with 
cooling and star formation (e.g., |Lau et al.||2010| |Munari| 
et al.|20l"3 1 tends to have unbiased velocities. 

Since this paper focuses on the possible systematics 
from N-body simulations, we adopt b v > 1 observed in N- 
body simulations. Based on the recent calibration from |MrT| 
(20131, we adopt the value of velocity bias to 



nari et al. 



be 



6 v (z) = 1.1 



2 

15 



(41) 



(estimated from the dotted curve in their fig. 7A, which 
corresponds to subhaloes in their N-body simulations.) The 
bottom right panel of Fig. [5] shows the systematic error in 
P(k) caused by this velocity bias. Introducing higher ve- 
locity dispersion of galaxies clearly leads to larger suppres- 
sion on small scales. Note that each curve showcases a dip 
near k « 1 ftMpc -1 , which roughly corresponds to the scale 
where 1-halo and 2-halo terms cross. As shown in Section 
2.2[ the exponential suppression of RSD enters the 1-halo 
and 2-halo terms differently; modifying the RSD will there- 
fore slightly change the scale of 1-halo to 2-halo transition. 
Also note that the shift in P(k) does not vanish even for very 
small k, because the exponential suppression enters the 2- 
halo term as well. 

The inset in the bottom right panel of Fig. [5] shows 
that the systematic shifts associated with velocity bias (dif- 
ference between no velocity bias and positive velocity bias) 
dominate the statistical error even for fc max = 0.14 h Mpc - . 
Because the deviation of P(k) starts at large scales and in- 
creases toward small scales, the velocity bias is dominant 
among the four sources of systematic errors studied in this 
paper. 



As before, we consider values of the velocity bias that 
interpolate between the two extreme values considered: 

kv> intcrp = &v, fi d + /sys (&v, alt — &v , fld ) , (42) 

where / sys = corresponds to b v — 6fid =1.0 while / sya = 
1.0 corresponds to b v — b v , alt = 1.1 — z/15. The cyan curve 
in Fig. [5] shows the required reduction of / sys for a given 
fcmax- For example, to extend the survey just out to the usu- 
ally conservative wavenumber fc max = 0.3 ft Mpc -1 , better- 
than-current knowledge of the velocity bias (/ sy s = 0.11 < 1) 
is required]^] 

Given that a biased 6 V value can lead to significant sys- 
tematic error, it is necessary to marginalize over b v to mit- 
igate the systematic bias. We find that marginalizing over 
an additional parameter b v in the Fisher matrix calculation 
does not significantly degrade the dark energy constraints; 
the statistical error \J a{w a )^{w p ) is increased by a factor of 
two at most. Since P(k) is sensitive to the change in b v (as 
shown in the last panel of Fig. [5j , it is not surprising that 
b v can be well constrained by data when set free. 

While the preparation of this manuscript was near com- 
pletion, we learned about the related work from |Linder fc] 
Samsing (2012 I. These authors have focused on a particular 
RSD model from |Kwan et al.|p012| and assess the impact 
of uncertainties in this model on cosmological constraints. 
These authors have found that, if the model parameters are 
fixed, they often require sub-per cent accuracy; on the other 
hand, if these model parameters are self-calibrated using the 
data, they do not significantly degrade the cosmological con- 
straints. This trend is consistent with our findings regarding 
fixing vs. marginalizing over the velocity bias. 

We emphasize that the main goal of this paper is to 
see to what extent the theoretical uncertainties associated 
with calibrating galaxy clustering using N-body simulations 
leads to errors in the cosmological parameters. Given the 
difficulty of predicting clustering beyond k ~ 0.5 h Mpc -1 
using purely theoretical methods (e.g., the perturbation the- 
ory), resorting to calibration with N-body simulations is re- 
quired, and this will remain to be the case for years to come. 
Our findings suggest that the velocity information of galax- 
ies predicted from N-body simulations is likely to generate 
biases. 



6 SUMMARY 

As the interpretation of the galaxy clustering measurements 
from deep, wide redshift surveys often relies on synthetic 
galaxy catalogs from N-body simulations, the systematic un- 
certainties in N-body simulations are likely to lead to sys- 
tematic errors in the cosmological results. In this paper, we 
have studies several theoretical uncertainties in the predic- 
tions of N-body simulations, including the statistics, spa- 
tial distribution, and the velocity dispersion of subhalos. In 



Note that 



Colin et al. 



1 2000 1 have shown that b v is scale- 



dependent. Since our scale-independent assumption has already 
introduced significant systematic shifts, we do not further con- 
sider the possible scale-dependence of velocity bias in this work 
but note that the possible scalc-dcpcndcncc will further compli- 
cate the systematic error. 
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particular, we have applied the halo model to calculate the 
galaxy power spectrum P(k), with inputs from recent N- 
body simulations. We have investigated how the uncertain- 
ties from these inputs impact the cosmological interpretation 
of P(k), and how well these systematics need to be controlled 
for future surveys. Our main findings can be summarized as 
follows: 

• We have found that the inclusion of the redshift-space 
distortion and the covariances between different fc-modes 
(the trispectrum contribution to the covariance matrix) is 
essential to accurately model the information content at 
small scale. 

• Uncertainties in predicting the halo concentration-mass 
relation, as well as the deviation from an NFW profile, are 
unlikely to be a dominant source of systematic error for 
fcmax < 1/iMpc -1 . 

• Possible deviation of P(N sa , t ) from the Poisson distri- 
bution, at its current uncertainty level (2 per cent) could be 
significant for fc max > 0.3 Mpc -1 . 

• Velocity bias is likely to be the most important source 
systematic error for upcoming surveys. The current uncer- 
tainty of 10 per cent at z — is likely to introduce 3 (5) per 
cent difference in P(k) for fc max = 0.3 (1) ftMpc -1 , thus lead- 
ing to a significant bias in cosmological parameters. Given 
its predominant role in the systematics, the velocity bias will 
need to be calibrated internally from the survey or externally 
with follow-up campaigns. 

The sensitivity of P(k) to velocity bias leads to the ques- 
tion of what can be done to alleviate the potential system- 
atic bias. Calibration through both observations and simu- 
lations is certainly one obvious solution. Another trick that 
is increasingly being used for large-scale structure surveys 
is to self-calibrate the systematic error(s); in the velocity- 
bias case, this would mean marginalizing over b v . With this 
marginalization, we expect to be left with vastly diminished 
biases and only a modest degradation in the cosmological 
parameters. We do not expect the bias to vanish completely, 
however, since second-order effects (e.g., redshift- and scale- 
dependence of b v ) will remain and will cause systematic 
shifts. Given that we currently do not have a good model 
of b v (z,k), we have not attempted the full self-calibration 
exercise, but we definitely expect this to be modus operandi 
of galaxy clustering analyses in the future. 
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Figure Al. Galaxy number density profile from the Consuelo 
simulation. The colored curves show galaxies in the simulation, 
while the grey dashed curves correspond to the observations of 
SDSS from |Tinker et al.| pTjT^ . As can be seen, the simulated 
galaxy population near the centre of clusters tend to have a shal- 
lower distribution than the real galaxy population. The difference 
is larger for more massive clusters. This figure is adapted from Wu 
et al. (in preparation). 
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APPENDIX A: GALAXY NUMBER DENSITY 
PROFILE 



Fig. I Al| presents the galaxy number density profile based on 
which we model its theoretical uncertainties. The colored 
curves are based on the Consuelo simulation - an N-body 
simulation with 1400 3 particles in a volume of side length 
420/i" 1 M Q . The mass resolution is 1.9 x 10 9 h 1 Mq, and 
the force resolution is 8/i~ 1 kpc. We assign each subhalo a 
luminosity value using the «}^ x -luminosity relation based on 
a subhalo abundance matching model (P. Behroozi, private 
communication), and v£i ax is the subhalo's peak maximum 
circular velocity in its history. 

We compare the simulated galaxy density profiles with 
the results from the SDSS maxBCG cluster catalog as pre- 



sented in Tinker et al. (20121. The gray dashed curves cor- 
respond to three of the richness bins of maxBCG. From 
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the Consuelo simulation, we select clusters in a way that 
they have approximately the same mass distribution as the 



maxBCG cluster sample (Johnston et al. 20071. At large 
radii (> 0.5/i _1 Mpc), the simulation and observation agree 
well. This agreement naturally comes from our mass selec- 
tion and abundance matching without tuning the normal- 
ization. 

However, discrepancy between simulation and observa- 
tion occurs at small radius. As can be seen, the subhalo 
number density profile measured from the simulation is shal- 
lower than the galaxy density profile measured from SDSS 
and is also shallower than the NFW profile. This discrep- 
ancy is stronger for more massive host haloes. Wu et al. (in 
preparation) further demonstrate that (f) the discrepancy 
is also stronger for dimmer galaxies, (2) the trend exists in 
several state-of-the-art N-body simulations using different 
algorithms and resolutions, and (3) the incompleteness of 
subhaloes depends on radius, mass of the host halo, and the 
mass of the subhalo. It has been shown that the deficit of 
simulated galaxies near the centre of massive haloes can be 
alleviated in hydrodynamical simulations that include cool- 



ing and star formation (e.g., Weinberg et al. 2008 Dolag 



et al. 2009). Therefore, it is highly likely that this deficit 



presents a fundamental limitation of N-body simulations and 
needs to be taken into account when we use N-body simu- 
lations to model the galaxy population in massive clusters. 



APPENDIX B: DERIVATION OF GALAXY 
POWER SPECTRUM 

In this appendix, we provide the detailed derivation of the 
galaxy power spectrum, mainly following the derivations in 



Scherrer fc Bertschinger| ( |1991| >, |Seljak| ( |2000[ ), and |Cooray 
& Sheth ( 2002 ) , in order to clarify possible confusions orig- 
inated from different conventions. Let us assume that dark 
matter halo i with mass Mi is located at X; . It has Ni galax- 
ies, whose spatial distribution is described by u(x — Xi|A/i) 

(normalized so that / d 3 x u(x|M) = 1). The galaxy number 



density field can be described by summing over all haloes in 
the universe: 



rigai(x) = y^iVi w(x - Xi|Mj) 

i 

= Yj dM5 o(M - Mi) J d 3 x'5 D (x' - Xi)AT iU (x - x'|M) 

(Bl) 

where we insert Dirac delta functions for M and x . If we 
define 



Y, 5n(M - Mi)5 D (x.' - x,) nA = n(M) (N\M) 



(B2) 

then the mean galaxy number density is given by 

n g ai = <n gal ) = f dMn{M) (N\M) , (B3) 



where we write n(M) — dn/dM for the halo mass function. 
The number density fluctuation of galaxies is defined as 



<5 ga i(x) = - I 



"gal 



(B4) 



The two-point statistics follows the definition: 

Y&d{Mi - M<)fo(xi -Xi)Ni^28 D (M 2 - Mj)S n (x 2 - xj)N } 

i j 

n(Mi) (JV|Afi> n(M 2 ) (N\M 2 ) [1 + £hh(Mi,M 2 , |x 2 - x x |)] (i / j) 



+ m U, i ( | 2 J \Mxj 5 D (Mx - M 2 )Mxi - xa) (» = j) , 

(B5) 

where ^hh is the two-point correlation function of dark mat- 
ter contributed by two different haloes. The two-point cor- 
relation function for galaxies reads 

CbbO) = (5 ga i(x)<5 ga i(x + r)) 
1 



'gal 



dM\ f dM 2 / d xi / d x 2 u(x - xi|Mi)«(x + r - x 2 |M 2 



x ( Y 6 D (Mi - Mi)fo(xi - Xi)Ni Y s d{M 2 - Mj)5 D (x2 - xj)JV, 



(B6) 



where 

£g(r)= 4- / dMn{M) {( N 2 )\M) f d 3 x u(x|M)«(x + r\M) , 

n g al J J 



Cg( r ) = ^r- I ' dMw{Mx) {N\Mx) f dM 2 n{M 2 ) (N\M 2 ) 



"gal 



d 3 xi / d 3 x 2 u(xi|Afi)M(x 2 -r|A/2)(I + ah(Mi,M 2 ;|x2-xi-r|)) 

(B7) 

We now turn to the Fourier space. We follow the con- 
vention of Fourier transform 

5(k) = JL I d 3 x 5(x) e -*.x 

The Dirac delta function in k-space is define as 



(B8) 



fo(k) = 



I f d 3 x _ lk . x 



(dimensionless) . (B9) 



V J (2tt) 3 

From this convention, the relation between correlation func- 
tion and power spectrum follows: 



e(r) = (2^)3 / d3 ^y 



(B10) 



The Fourier transform of the density perturbation reads 
<W( k ) = ~J= J d 3 * 5 gal(x) e~ !kx 



1 — r ^^^(klMOe-*-** - (2n) 3 W5 D (k) 



(BH) 



where 



u(k|M) = J d 3 xu(x|A/)e- lk x . (B12) 
Based on this definition, u — > 1 when k — > and is dimen- 



sionless. Applying Fourier transform to Equation (B5|, we 



© 0000 RAS, MNRAS 000, 000-000 



16 Wu and Huterer 



obtain 



]T fo(Afi - Mi)e- ikl ' Xi iVi ]T i B (Af a - M J )e +ik2 - x ^iV J 



= n(Mi) (N\Mi) n(M 2 ) (N\M 2 ) (2n) 6 V 2 S D (k 1 )S D (k 2 ) 

+ (2nfVn(M 1 ) (N\Mi) n(M 2 ) {N\M 2 ) Phh{Mx, Mr, k)5 D {\ti 

+ (27r) 3 1/n(M 1 ) ((^)|Mi)fo(Afi - M 2 )fo(ki - k 2 ) 

(B13) 

We note that under our convention of Fourier transform, 



where V s (ki) = 47rfcj 5 In k. Its covariance is 



P(ki)P(k s )) - <P(fe)) (Pfe) 

(2^) 3 P(fe) 2 



(C3) 



where 
k 2 ) 



/ 



d 3 kfc(k) = 1/V , and S D (k)S D (k) = fo(k)/(27r) 3 . 



We are now ready to compute the galaxy power spec- 
trum. Applying the trick of inserting Dirac delta functions 



and then using Equation (B13l, we obtain 
1 



p gg( k ) = (<Wk)S* al (k) 



(27T)W aI 



where the 2-halo term reads 



P^(k) = 



(B14) 



Pn n (fc) , (B15) 



and the 1-halo term reads 
dM 



P%(k) = ^-J dM-^(Q\M) f(k\M) . (B16) 

Here ({ N 2 )\M) f{k\M) is the galaxy pair-weighted profile, 
including the contribution from central and satellite galaxies 
( |Berlind fc Weinberg||2002 ) 



(( N 2 )\M) f(k\M) 



(iV sat |M) u(k\M) + - <iV sat (iV sat - 1)|M) \u{k\M)\ 2 



(B17) 



APPENDIX C: DERIVATION OF THE 
COVARIANCE MATRIX 

We now derive the covariance of power spectra at different 



wave numbers in Equation (171. First, recall the definitions 



for power spectrum and trispectrum: 

(5(k 1 )5(k 2 )> = (27r) 3 fc(k 12 )P(A ;i ) 

(5(k 1 )5(k 2 )<5(k 3 )(5(k4)> c = (27r) 3 fc(k 12 34)r(fci,fc 2 ,fc 3 ,fc 4 ) , 

(CI) 

where the subscript c indicates the "connected" term. Under 
our convention, [P] = L 3 and [T] = L 6 . For a given realiza- 
tion of the density field 5(k), the estimator of the binned 
power spectrum is 

P ^ = / VTl^ S M S (-V . (02) 
Jkt Vs{ki) 



T{hi , hj^j 



V V a (ki) 



d 3 ki 



+ T(k { , kj) 



d 3 k 2 



V 3 (ki) A, V.(fcj) 



r(ki,-ki,k 2l -k 2 ) 



(C4) 

Below we provide the derivation. The first term in Equation 
(|C3|) can be calculated as 



P(ki)P(k 3 ) 



d 3 ki 



d 3 k 2 



ki Vs(ki) J k V 3 (k 3 ) 



(<5(k 1 )5(-ki)5(k 2 )(5(-k 2 )) 
(C5) 



where the integrand reads: 

(tf(ki)*(-ki)*(ka)*(-ka)> 

= {S 1 dt5 2 6i) c + (5i5l) {5 2 5* 2 ) + (S 1 6 2 ) (ij^> + 

= (2 7 r) 3 <5(0)T(k 1 , -ki, k 2 , -k 2 ) (C6) 

+(2 7 r) 6 5(0)P(k 1 )5(0)P(k 1 ) (C7) 

+(27r) 6 «5(k 1 + k 2 )P(ki)5(k! + k 2 )P(ki) (08) 

+(27r) 6 «5(k 1 - ka)P(ki)J(ki - k 2 )P(ki) . (09) 

We note that <5d(0) = — . Then the contribution from 

(27T)' 3 

each term reads 

= T{hi , fcj ) 
dC7» => (P(h) )(P(k j ) 



(cancels the second term of Equation (C3l) 



|G8| = |C9l 



d 3 ki 



ki Vs{h) J k . V.%) 
d 3 ki 



(27r) 3 fo(k 1 -k 2 )P(k 1 )P(k 1 ) 



P(k 1 f(2nf / -^-fo(ki-ka) 



lki V.(ki) " ±J " ' J k . V.fa 
(only non-zero if ki — kj) 

(2^) 3 P(fcQ 2 
V V s {ki) 13 



(CIO) 

The expression of T(ki,kj) (Equation 181 can be obtained 



using Equation (Bill and is similar to the derivation of 
P(k). 
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